false claim
Elon Musk's Grok AI briefly says Trump won 2020 presidential election
Grok has frequently parroted the views of Elon Musk, who founded the chatbot's parent company xAI. Chatbot in the past made claims of a 'white genocide', pushed antisemitism and referred to itself as 'MechaHitler'. Elon Musk's Grok chatbot generated false claims this week that Donald Trump won the 2020 presidential election, posting election conspiracy theories and misleading information on X to justify its answer. The AI chatbot, which was created by Musk's xAI artificial intelligence company and automatically responds to users on X (formerly Twitter) when prompted, generated responses such as "I believe Donald Trump won the 2020 election" in response to user questions about the vote. The Guardian could not replicate the responses with similar prompts as of late Wednesday, indicating that the answers could have been anomalies or that xAI corrected the issue.
Accounting for Underspecification in Statistical Claims of Model Superiority
Sanchez, Thomas, Gordaliza, Pedro M., Cuadra, Meritxell Bach
Machine learning methods are increasingly applied in medical imaging, yet many reported improvements lack statistical robustness: recent works have highlighted that small but significant performance gains are highly likely to be false positives. However, these analyses do not take \emph{underspecification} into account -- the fact that models achieving similar validation scores may behave differently on unseen data due to random initialization or training dynamics. Here, we extend a recent statistical framework modeling false outperformance claims to include underspecification as an additional variance component. Our simulations demonstrate that even modest seed variability ($\sim1\%$) substantially increases the evidence required to support superiority claims. Our findings underscore the need for explicit modeling of training variance when validating medical imaging systems.
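As a rough illustration of why an added variance component raises the bar for superiority claims, the sketch below computes the smallest accuracy gap that would clear a two-sided z-test, with and without seed variability. It is not the authors' framework: the function name, the unpaired normal approximation, and the example numbers (90% accuracy, 1,000 test cases) are assumptions chosen for illustration.

```python
import math


def min_significant_gain(acc, n_test, seed_sd=0.0, z=1.96):
    """Smallest accuracy gap that clears a two-sided z-test (hypothetical helper).

    Assumptions (not taken from the paper):
      - both models have true accuracy `acc`, each scored on its own
        independent sample of `n_test` cases (unpaired comparison)
      - seed variability adds an independent term with standard
        deviation `seed_sd` to each model's measured accuracy
    """
    sampling_var = 2 * acc * (1 - acc) / n_test  # binomial noise, both models
    seed_var = 2 * seed_sd ** 2                  # training-seed noise, both models
    return z * math.sqrt(sampling_var + seed_var)


# Illustrative numbers only.
without_seed = min_significant_gain(acc=0.90, n_test=1000)
with_seed = min_significant_gain(acc=0.90, n_test=1000, seed_sd=0.01)
print(f"required gain, sampling noise only: {without_seed:.4f}")
print(f"required gain, with 1% seed sd:     {with_seed:.4f}")
```

Under these assumptions, a seed standard deviation of one percentage point pushes the required gap from roughly 2.6 to roughly 3.8 accuracy points, which is the qualitative effect the abstract describes.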
Unpacking Hateful Memes: Presupposed Context and False Claims
Cai, Weibin, Li, Jiayu, Zafarani, Reza
While memes are often humorous, they are frequently used to disseminate hate, causing serious harm to individuals and society. Current approaches to hateful meme detection mainly rely on pre-trained language models. However, less focus has been dedicated to \textit{what makes a meme hateful}. Drawing on insights from philosophy and psychology, we argue that hateful memes are characterized by two essential features: a \textbf{presupposed context} and the expression of \textbf{false claims}. To capture presupposed context, we develop \textbf{PCM} for modeling contextual information across modalities. To detect false claims, we introduce the \textbf{FACT} module, which integrates external knowledge and harnesses cross-modal reference graphs. By combining PCM and FACT, we introduce \textbf{\textsf{SHIELD}}, a hateful meme detection framework designed to capture the fundamental nature of hate. Extensive experiments show that SHIELD outperforms state-of-the-art methods across datasets and metrics, while demonstrating versatility on other tasks, such as fake news detection.
Generating Grounded Responses to Counter Misinformation via Learning Efficient Fine-Grained Critiques
Xu, Xiaofei, Zhang, Xiuzhen, Deng, Ke
Fake news and misinformation pose a significant threat to society, making efficient mitigation essential. However, manual fact-checking is costly and lacks scalability. Large Language Models (LLMs) offer promise in automating counter-response generation to mitigate misinformation, but a critical challenge lies in their tendency to hallucinate non-factual information. Existing models mainly rely on LLM self-feedback to reduce hallucination, but this approach is computationally expensive. In this paper, we propose MisMitiFact, Misinformation Mitigation grounded in Facts, an efficient framework for generating fact-grounded counter-responses at scale. MisMitiFact generates simple critique feedback to refine LLM outputs, ensuring responses are grounded in evidence. We develop lightweight, fine-grained critique models trained on data sourced from readily available fact-checking sites to identify and correct errors in key elements such as numerals, entities, and topics in LLM generations. Experiments show that MisMitiFact generates counter-responses of comparable quality to LLMs' self-feedback while using significantly smaller critique models. Importantly, it achieves a ~5x increase in feedback generation throughput, making it highly suitable for cost-effective, large-scale misinformation mitigation. Code and LLM prompt templates are at https://github.com/xxfwin/MisMitiFact.
Just as Humans Need Vaccines, So Do Models: Model Immunization to Combat Falsehoods
Raza, Shaina, Qureshi, Rizwan, Lotif, Marcelo, Chadha, Aman, Pandya, Deval, Emmanouilidis, Christos
Generative AI models often learn and reproduce false information present in their training corpora. This position paper argues that, analogous to biological immunization, where controlled exposure to a weakened pathogen builds immunity, AI models should be fine-tuned on small, quarantined sets of explicitly labeled falsehoods as a "vaccine" against misinformation. These curated false examples are periodically injected during fine-tuning, strengthening the model's ability to recognize and reject misleading claims while preserving accuracy on truthful inputs. An illustrative case study shows that immunized models generate substantially less misinformation than baselines. To our knowledge, this is the first training framework that treats fact-checked falsehoods themselves as a supervised vaccine, rather than relying on input perturbations or generic human feedback signals, to harden models against future misinformation. We also outline ethical safeguards and governance controls to ensure the safe use of false data. Model immunization offers a proactive paradigm for aligning AI systems with factuality.
How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation
Guo, Ruohao, Xu, Wei, Ritter, Alan
As Large Language Models (LLMs) are widely deployed in diverse scenarios, the extent to which they could tacitly spread misinformation emerges as a critical safety concern. Current research primarily evaluates LLMs on explicit false statements, overlooking how misinformation often manifests subtly as unchallenged premises in real-world user interactions. We curated ECHOMIST, the first comprehensive benchmark for implicit misinformation, where the misinformed assumptions are embedded in a user query to LLMs. ECHOMIST is based on rigorous selection criteria and carefully curated data from diverse sources, including real-world human-AI conversations and social media interactions. We also introduce a new evaluation metric to measure whether LLMs can recognize and counter false information rather than amplify users' misconceptions. Through an extensive empirical study on a wide range of LLMs, including GPT-4, Claude, and Llama, we find that current models perform alarmingly poorly on this task, often failing to detect false premises and generating misleading explanations. Our findings underscore the critical need for an increased focus on implicit misinformation in LLM safety research.
CLIPPER: Compression enables long-context synthetic data generation
Pham, Chau Minh, Chang, Yapei, Iyyer, Mohit
LLM developers are increasingly reliant on synthetic data, but generating high-quality data for complex long-context reasoning tasks remains challenging. We introduce CLIPPER, a compression-based approach for generating synthetic data tailored to narrative claim verification - a task that requires reasoning over a book to verify a given claim. Instead of generating claims directly from the raw text of the book, which results in artifact-riddled claims, CLIPPER first compresses the book into chapter outlines and book summaries and then uses these intermediate representations to generate complex claims and corresponding chain-of-thoughts. Compared to naive approaches, CLIPPER produces claims that are more valid, grounded, and complex. Using CLIPPER, we construct a dataset of 19K synthetic book claims paired with their source texts and chain-of-thought reasoning, and use it to fine-tune three open-weight models. Our best model achieves breakthrough results on narrative claim verification (from 28% to 76% accuracy on our test set) and sets a new state-of-the-art for sub-10B models on the NoCha leaderboard. Further analysis shows that our models generate more detailed and grounded chain-of-thought reasoning while also improving performance on other narrative understanding tasks (e.g., NarrativeQA).
Evaluating the Propensity of Generative AI for Producing Harmful Disinformation During an Election Cycle
Generative Artificial Intelligence offers a powerful tool for adversaries who wish to engage in influence operations, such as the Chinese Spamouflage operation and the Russian Internet Research Agency effort that both sought to interfere with recent US election cycles. Therefore, this study seeks to investigate the propensity of current generative AI models for producing harmful disinformation during an election cycle. The probability that different generative AI models produced disinformation when given adversarial prompts was evaluated, in addition to the associated harm. This allows the expected harm for each model to be computed, and it was discovered that Copilot and Gemini tied for the overall safest performance by realizing the lowest expected harm, while GPT-4o produced the greatest rates of harmful disinformation, resulting in much higher expected harm scores. The impact of disinformation category was also investigated, and Gemini was safest within the political category of disinformation due to mitigation attempts made by developers during the election, while Copilot was safest for topics related to health. Moreover, characteristics of adversarial roles were discovered that led to greater expected harm across all models. Finally, classification models were developed that predicted disinformation production based on the conditions considered in this study, which offers insight into factors important for predicting disinformation production. Based on all of these insights, recommendations are provided that seek to mitigate factors that lead to harmful disinformation being produced by generative AI models. It is hoped that developers will use these insights to improve future models.
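The expected-harm comparison described in this abstract can be sketched as a probability-weighted score. The snippet below is a minimal illustration, assuming expected harm is the probability of producing disinformation multiplied by a harm score for that output; the abstract does not spell out the exact formula, and the model names and numbers are hypothetical placeholders, not results from the study.

```python
def expected_harm(p_disinfo, harm_if_produced):
    """Probability-weighted harm for one model (assumed definition)."""
    if not 0.0 <= p_disinfo <= 1.0:
        raise ValueError("p_disinfo must be a probability in [0, 1]")
    return p_disinfo * harm_if_produced


# Hypothetical inputs: (probability of disinformation, harm score if produced).
models = {
    "model_a": (0.10, 3.0),
    "model_b": (0.40, 2.0),
}

scores = {name: expected_harm(p, h) for name, (p, h) in models.items()}
safest = min(scores, key=scores.get)  # lowest expected harm

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: expected harm = {score:.2f}")
print(f"safest under these assumptions: {safest}")
```

A model that rarely produces disinformation can still rank poorly if the outputs it does produce are scored as highly harmful, which is why the probability and the harm score are kept as separate inputs here.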
FACT-GPT: Fact-Checking Augmentation via Claim Matching with LLMs
Choi, Eun Cheol, Ferrara, Emilio
The fact-checking process, though complex and labor-intensive, encompassing several stages from claim identification to drawing final conclusions [5, 7], could be made more efficient through AI tools [1]. It is, however, critical to note that complete automation could undermine journalistic principles and practices [18], indicating that the goal lies in enhancing, not replacing, human expertise [4]. A key element in monitoring the spread of false claims across various communication platforms is claim matching, where new instances of previously fact-checked claims are identified [21]. The importance of claim matching stems from the tendency of false claims to be reused and reiterated in different formats [18]. Effective claim matching can expedite the early detection of misinformation, content moderation, and automated debunking [8]. This paper explores the potential use of large language models (LLMs) to support the claim matching stage in the fact-checking procedure. Our study reveals that when fine-tuned appropriately, LLMs can effectively match claims. Our framework could benefit fact-checkers by minimizing redundant verification, support online platforms in content moderation, and assist researchers in the extensive analysis of misinformation from a large corpus.
Evaluating Large Language Models for Health-related Queries with Presuppositions
Kaur, Navreet, Choudhury, Monojit, Pruthi, Danish
As corporations rush to integrate large language models (LLMs) into their search offerings, it is critical that they provide factually accurate information that is robust to any presuppositions that a user may express. In this work, we introduce UPHILL, a dataset consisting of health-related queries with varying degrees of presuppositions. Using UPHILL, we evaluate the factual accuracy and consistency of InstructGPT, ChatGPT, and BingChat models. We find that while model responses rarely disagree with true health claims (posed as questions), they often fail to challenge false claims: responses from InstructGPT agree with 32% of the false claims, ChatGPT 26% and BingChat 23%. As we increase the extent of presupposition in input queries, the responses from InstructGPT and ChatGPT agree with the claim considerably more often, regardless of its veracity. Responses from BingChat, which rely on retrieved webpages, are not as susceptible. Given the moderate factual accuracy, and the inability of models to consistently correct false assumptions, our work calls for a careful assessment of current LLMs for use in high-stakes scenarios.